## 'data.frame': 1599 obs. of 12 variables:
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
## fixed.acidity volatile.acidity citric.acid residual.sugar
## Min. : 4.60 Min. :0.1200 Min. :0.000 Min. : 0.900
## 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090 1st Qu.: 1.900
## Median : 7.90 Median :0.5200 Median :0.260 Median : 2.200
## Mean : 8.32 Mean :0.5278 Mean :0.271 Mean : 2.539
## 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420 3rd Qu.: 2.600
## Max. :15.90 Max. :1.5800 Max. :1.000 Max. :15.500
## chlorides free.sulfur.dioxide total.sulfur.dioxide
## Min. :0.01200 Min. : 1.00 Min. : 6.00
## 1st Qu.:0.07000 1st Qu.: 7.00 1st Qu.: 22.00
## Median :0.07900 Median :14.00 Median : 38.00
## Mean :0.08747 Mean :15.87 Mean : 46.47
## 3rd Qu.:0.09000 3rd Qu.:21.00 3rd Qu.: 62.00
## Max. :0.61100 Max. :72.00 Max. :289.00
## density pH sulphates alcohol
## Min. :0.9901 Min. :2.740 Min. :0.3300 Min. : 8.40
## 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500 1st Qu.: 9.50
## Median :0.9968 Median :3.310 Median :0.6200 Median :10.20
## Mean :0.9967 Mean :3.311 Mean :0.6581 Mean :10.42
## 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300 3rd Qu.:11.10
## Max. :1.0037 Max. :4.010 Max. :2.0000 Max. :14.90
## quality
## Min. :3.000
## 1st Qu.:5.000
## Median :6.000
## Mean :5.636
## 3rd Qu.:6.000
## Max. :8.000
Most wine’s quality is 6 and range is 3 to 8. The mean of alcohol is 10.42.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.60 7.10 7.90 8.32 9.20 15.90
The volatile.acidity distribution is normal. The median is 7.9.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1200 0.3900 0.5200 0.5278 0.6400 1.5800
The volatile.acidity distribution is bimodal with the volatile.acidity peaking at 0.4, 0.5 and 0.6.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.090 0.260 0.271 0.420 1.000
## feature
## 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14
## 132 33 50 30 29 20 24 22 33 30 35 15 27 18 21
## 0.15 0.16 0.17 0.18 0.19 0.2 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29
## 19 9 16 22 21 25 33 27 25 51 27 38 20 19 21
## 0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4 0.41 0.42 0.43 0.44
## 30 30 32 25 24 13 20 19 14 28 29 16 29 15 23
## 0.45 0.46 0.47 0.48 0.49 0.5 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59
## 22 19 18 23 68 20 13 17 14 13 12 8 9 9 8
## 0.6 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69 0.7 0.71 0.72 0.73 0.74
## 9 2 1 10 9 7 14 2 11 4 2 1 1 3 4
## 0.75 0.76 0.78 0.79 1
## 1 3 1 1 1
The distribution for citric acid appears bimodal with the peaking at 0, 0.24, 0.49.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.900 1.900 2.200 2.539 2.600 15.500
## feature
## 0.9 1.2 1.3 1.4 1.5 1.6 1.65 1.7 1.75 1.8 1.9 2 2.05 2.1 2.15
## 2 8 5 35 30 58 2 76 2 129 117 156 2 128 2
## 2.2 2.25 2.3 2.35 2.4 2.5 2.55 2.6 2.65 2.7 2.8 2.85 2.9 2.95 3
## 131 1 109 1 86 84 1 79 1 39 49 1 24 1 25
## 3.1 3.2 3.3 3.4 3.45 3.5 3.6 3.65 3.7 3.75 3.8 3.9 4 4.1 4.2
## 7 15 11 15 1 2 8 1 4 1 8 6 11 6 5
## 4.25 4.3 4.4 4.5 4.6 4.65 4.7 4.8 5 5.1 5.15 5.2 5.4 5.5 5.6
## 1 8 4 4 6 2 1 3 1 5 1 3 1 8 6
## 5.7 5.8 5.9 6 6.1 6.2 6.3 6.4 6.55 6.6 6.7 7 7.2 7.3 7.5
## 1 4 3 4 4 3 2 3 2 2 2 1 1 1 1
## 7.8 7.9 8.1 8.3 8.6 8.8 8.9 9 10.7 11 12.9 13.4 13.8 13.9 15.4
## 2 3 2 3 1 2 1 1 1 2 1 1 2 1 2
## 15.5
## 1
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.900 1.900 2.200 2.539 2.600 15.500
## feature
## 0.9 1.2 1.3 1.4 1.5 1.6 1.65 1.7 1.75 1.8 1.9 2 2.05 2.1 2.15
## 2 8 5 35 30 58 2 76 2 129 117 156 2 128 2
## 2.2 2.25 2.3 2.35 2.4 2.5 2.55 2.6 2.65 2.7 2.8 2.85 2.9 2.95 3
## 131 1 109 1 86 84 1 79 1 39 49 1 24 1 25
## 3.1 3.2 3.3 3.4 3.45 3.5 3.6 3.65 3.7 3.75 3.8 3.9 4 4.1 4.2
## 7 15 11 15 1 2 8 1 4 1 8 6 11 6 5
## 4.25 4.3 4.4 4.5 4.6 4.65 4.7 4.8 5 5.1 5.15 5.2 5.4 5.5 5.6
## 1 8 4 4 6 2 1 3 1 5 1 3 1 8 6
## 5.7 5.8 5.9 6 6.1 6.2 6.3 6.4 6.55 6.6 6.7 7 7.2 7.3 7.5
## 1 4 3 4 4 3 2 3 2 2 2 1 1 1 1
## 7.8 7.9 8.1 8.3 8.6 8.8 8.9 9 10.7 11 12.9 13.4 13.8 13.9 15.4
## 2 3 2 3 1 2 1 1 1 2 1 1 2 1 2
## 15.5
## 1
## 90%
## 3.6
Transform the long tail data to better understand the distribution of residual.sugar The distribution for residual.sugar appears to be right skewed. Most of them (90%) residual.sugar less than 3.6 (4.5 g / cm^3 are considered sweet).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01200 0.07000 0.07900 0.08747 0.09000 0.61100
## 95%
## 0.1261
Transform the long tail data to better understand the distribution of chlorides The distribution for chlorides appears to be right skewed. Most of them (95%) chlorides less than 0.1261 .
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 7.00 14.00 15.87 21.00 72.00
## 95%
## 35
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 7.00 14.00 15.87 21.00 72.00
## 95%
## 35
Transform the long tail data to better understand the distribution of free.sulfur.dioxide The free.sulfur.dioxide distribution is bimodal with the free.sulfur.dioxide peaking at 7 and 17.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 22.00 38.00 46.47 62.00 289.00
## 95%
## 112.1
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 22.00 38.00 46.47 62.00 289.00
## 95%
## 112.1
Transform the long tail data to better understand the distribution of total.sulfur.dioxide The total.sulfur.dioxide distribution is normal.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.740 3.210 3.310 3.311 3.400 4.010
## feature
## 2.74 2.86 2.87 2.88 2.89 2.9 2.92 2.93 2.94 2.95 2.98 2.99 3 3.01 3.02
## 1 1 1 2 4 1 4 3 4 1 5 2 6 5 8
## 3.03 3.04 3.05 3.06 3.07 3.08 3.09 3.1 3.11 3.12 3.13 3.14 3.15 3.16 3.17
## 6 10 8 10 11 11 11 19 9 20 13 21 34 36 27
## 3.18 3.19 3.2 3.21 3.22 3.23 3.24 3.25 3.26 3.27 3.28 3.29 3.3 3.31 3.32
## 30 25 39 36 39 32 29 26 53 35 42 46 57 39 45
## 3.33 3.34 3.35 3.36 3.37 3.38 3.39 3.4 3.41 3.42 3.43 3.44 3.45 3.46 3.47
## 37 43 39 56 37 48 48 37 34 33 17 29 20 22 21
## 3.48 3.49 3.5 3.51 3.52 3.53 3.54 3.55 3.56 3.57 3.58 3.59 3.6 3.61 3.62
## 19 10 14 15 18 17 16 8 11 10 10 8 7 8 4
## 3.63 3.66 3.67 3.68 3.69 3.7 3.71 3.72 3.74 3.75 3.78 3.85 3.9 4.01
## 3 4 3 5 4 1 4 3 1 1 2 1 2 2
The pH distribution is normal.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9901 0.9956 0.9968 0.9967 0.9978 1.0040
The distribution for density acid appears to be normal and the different between min and max is only 0.014. ( different between alcohol and water is 0.22)
Ref : https://en.wikipedia.org/wiki/Ethanol
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3300 0.5500 0.6200 0.6581 0.7300 2.0000
## 95%
## 0.93
Transform the long tail data to better understand the distribution of sulphates. The distribution for sulphates appears to be normal. Most of them (95%) sulphates less than 0.93.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
The distribution for alcohol appears to be right skewed.
## feature
## 3 4 5 6 7 8
## 10 53 681 638 199 18
## [1] 0.9493433
Most of data’s wine qulity is between 5 to 7 (94.9 %). I think I will covert this feature to factor for Multivariate Analysis.
ANS : There are 1599 wine in the data set with 12 features.
## 'data.frame': 1599 obs. of 12 variables:
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
## fixed.acidity volatile.acidity citric.acid residual.sugar
## Min. : 4.60 Min. :0.1200 Min. :0.000 Min. : 0.900
## 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090 1st Qu.: 1.900
## Median : 7.90 Median :0.5200 Median :0.260 Median : 2.200
## Mean : 8.32 Mean :0.5278 Mean :0.271 Mean : 2.539
## 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420 3rd Qu.: 2.600
## Max. :15.90 Max. :1.5800 Max. :1.000 Max. :15.500
## chlorides free.sulfur.dioxide total.sulfur.dioxide
## Min. :0.01200 Min. : 1.00 Min. : 6.00
## 1st Qu.:0.07000 1st Qu.: 7.00 1st Qu.: 22.00
## Median :0.07900 Median :14.00 Median : 38.00
## Mean :0.08747 Mean :15.87 Mean : 46.47
## 3rd Qu.:0.09000 3rd Qu.:21.00 3rd Qu.: 62.00
## Max. :0.61100 Max. :72.00 Max. :289.00
## density pH sulphates alcohol
## Min. :0.9901 Min. :2.740 Min. :0.3300 Min. : 8.40
## 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500 1st Qu.: 9.50
## Median :0.9968 Median :3.310 Median :0.6200 Median :10.20
## Mean :0.9967 Mean :3.311 Mean :0.6581 Mean :10.42
## 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300 3rd Qu.:11.10
## Max. :1.0037 Max. :4.010 Max. :2.0000 Max. :14.90
## quality
## Min. :3.000
## 1st Qu.:5.000
## Median :6.000
## Mean :5.636
## 3rd Qu.:6.000
## Max. :8.000
## fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## 1 7.4 0.70 0.00 1.9 0.076
## 2 7.8 0.88 0.00 2.6 0.098
## 3 7.8 0.76 0.04 2.3 0.092
## 4 11.2 0.28 0.56 1.9 0.075
## 5 7.4 0.70 0.00 1.9 0.076
## 6 7.4 0.66 0.00 1.8 0.075
## free.sulfur.dioxide total.sulfur.dioxide density pH sulphates alcohol
## 1 11 34 0.9978 3.51 0.56 9.4
## 2 25 67 0.9968 3.20 0.68 9.8
## 3 15 54 0.9970 3.26 0.65 9.8
## 4 17 60 0.9980 3.16 0.58 9.8
## 5 11 34 0.9978 3.51 0.56 9.4
## 6 13 40 0.9978 3.51 0.56 9.4
## quality
## 1 5
## 2 5
## 3 5
## 4 6
## 5 5
## 6 5
Input variables (based on physicochemical tests): 1. - fixed acidity (tartaric acid - g / dm^3): most acids involved with wine or fixed or nonvolatile (do not evaporate readily) 2. - volatile acidity (acetic acid - g / dm^3): the amount of acetic acid in wine, which at too high of levels can lead to an unpleasant, vinegar taste 3. - citric acid (g / dm^3): found in small quantities, citric acid can add ‘freshness’ and flavor to wines 4. - residual sugar (g / dm^3): the amount of sugar remaining after fermentation stops, it’s rare to find wines with less than 1 gram/liter and wines with greater than 45 grams/liter are considered sweet 5. - chlorides (sodium chloride - g / dm^3): the amount of salt in the wine 6. - free sulfur dioxide (mg / dm^3): the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine 7. - total sulfur dioxide (mg / dm^3): amount of free and bound forms of S02; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine 8. - density (g / cm^3) 9. - pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale 10. - sulphates (potassium sulphate - g / dm3): a wine additive which can contribute to sulfur dioxide gas (S02) levels, wich acts as an antimicrobial and antioxidant 11. - alcohol (% by volume): the percent alcohol content of the wine
Output variable (based on sensory data): 12. - quality (score between 0 and 10)
ANS: The main feature of interest is wine’s quality. I would like to investigate which variable(s) effect the wine quality.
ANS: I think smell taste touch and addictive content that will effect the wine’s quality so the features that I choose for investigation is :
## fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## 1 7.4 0.70 0.00 1.9 0.076
## 2 7.8 0.88 0.00 2.6 0.098
## 3 7.8 0.76 0.04 2.3 0.092
## 4 11.2 0.28 0.56 1.9 0.075
## 5 7.4 0.70 0.00 1.9 0.076
## 6 7.4 0.66 0.00 1.8 0.075
## free.sulfur.dioxide total.sulfur.dioxide density pH sulphates alcohol
## 1 11 34 0.9978 3.51 0.56 9.4
## 2 25 67 0.9968 3.20 0.68 9.8
## 3 15 54 0.9970 3.26 0.65 9.8
## 4 17 60 0.9980 3.16 0.58 9.8
## 5 11 34 0.9978 3.51 0.56 9.4
## 6 13 40 0.9978 3.51 0.56 9.4
## quality sourness
## 1 5 5.1800
## 2 5 5.4600
## 3 5 5.4784
## 4 6 8.0976
## 5 5 5.1800
## 6 5 5.1800
Yes, I create “sourness” from fixed.acidity and citric.acid that represent the sourness of wine.
ANS: The distribution for citric acid, volatile.acidity and free.sulfur.dioxide appears bimodal and I tidies the data by remove X feature that I am not interested and transform fixed.acidity and citric.acid to sourness for next investigation.
## [1] 1255 840 538 820 834 888 1213 921 90 300 1152 901 1410 425
## [15] 599 1367 661 416 564 950 994 85 1530 590 496 1092 1509 333
## [29] 982 1531 721 1551 344 984 847 1576 753 349 618 454 1159 442
## [43] 92 838 32 302 1313 1505 1125 365 1420 307 226 943 893 736
## [57] 1412 218 1466 1042 1157 1575 700 481 331 1204 183 1436 10 802
## [71] 709 735 730 817 1342 1447 194 40 848 1194 731 827 159 1584
## [85] 1515 592 829 1353 1054 312 151 685 1181 609 1035 535 992 515
## [99] 136 1563
Top correlation values for quality is : 1. alcohol : 0.476 2. volatile.acidity : -0.391 3. sulphates : 0.251 4. citric acid : 0.226
##
## Pearson's product-moment correlation
##
## data: feature and quality
## t = -16.954, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4313210 -0.3482032
## sample estimates:
## cor
## -0.3905578
##
## Pearson's product-moment correlation
##
## data: feature and quality
## t = 9.2875, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1793415 0.2723711
## sample estimates:
## cor
## 0.2263725
## Warning: Removed 58 rows containing non-finite values (stat_boxplot).
##
## Pearson's product-moment correlation
##
## data: feature and quality
## t = 10.38, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2049011 0.2967610
## sample estimates:
## cor
## 0.2513971
##
## Pearson's product-moment correlation
##
## data: feature and quality
## t = 21.639, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4373540 0.5132081
## sample estimates:
## cor
## 0.4761663
ANS: From the plots and correlation values sulphates, citric acid acidity, alcohol positively relate with quality but volatile acidity negatively relate with quality.
Alcohol sulphates and volatile acidity ’s plot show the different between 3 wine rating of wine very well but citric acid show the different between normal and good wine poorly.
##
## Pearson's product-moment correlation
##
## data: featureX and featureY
## t = -26.489, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5856550 -0.5174902
## sample estimates:
## cor
## -0.5524957
##
## Pearson's product-moment correlation
##
## data: featureX and featureY
## t = 13.159, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2678558 0.3563278
## sample estimates:
## cor
## 0.31277
##
## Pearson's product-moment correlation
##
## data: featureX and featureY
## t = 4.4188, df = 1597, p-value = 1.059e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.06121189 0.15807276
## sample estimates:
## cor
## 0.1099032
ANS: I found that citric acid and volatile acidity very correlate.
citric acid and volatile acidity : -0.5524957
citric acid and sulphates acidity : 0.31277
citric acid and alcohol acidity : 0.1099032
ANS: For feature of interest alcohol percentage has highest corelation value.(0.476)
For every pair of features free.sulfur.dioxide and total.sulfur.dioxide has highest corelation value. (0.66
First I need to prepare alcohol.grade for multivariate plot.
##
## low medium-low medium
## 680 639 280
From the plot show that excellent wine mostly stay on the top left, good wine stay in the middle and normal wine stay in the bottom right.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Wine rating.vs.alcohol.grade.vs.volatile.acidity plot shows that :
excellent wine ratio in alcohol grade “medium” on volatile.acidity range 0.25-0.4 is very high.
## [1] "fixed.acidity" "volatile.acidity" "citric.acid"
## [4] "residual.sugar" "chlorides" "free.sulfur.dioxide"
## [7] "total.sulfur.dioxide" "density" "pH"
## [10] "sulphates" "alcohol" "quality"
## [13] "sourness" "wine_rating" "alcohol.grade"
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Wine rating.vs.alcohol.grade.vs.citric.acid plot shows that excellent wine ratio in alcohol grade “medium” on citric.acid at 0 and 0.4 is very high.
## [1] "fixed.acidity" "volatile.acidity" "citric.acid"
## [4] "residual.sugar" "chlorides" "free.sulfur.dioxide"
## [7] "total.sulfur.dioxide" "density" "pH"
## [10] "sulphates" "alcohol" "quality"
## [13] "sourness" "wine_rating" "alcohol.grade"
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Wine rating.vs.alcohol.grade.vs.total.sulfur.dioxide plot shows that excellent wine ratio in alcohol grade “medium” on total.sulfur.dioxide at 5-30 is very high.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Wine rating.vs.alcohol.grade.vs.sulphates plot shows that excellent wine ratio in alcohol grade “medium” on sulphates range 0.7-0.9 is very high.
Pattern is not noticable here.
From the plots , show that alcohol feature is the highest impact feature.
Wine rating.vs.alcohol.grade.vs.volatile.acidity plot shows that excellent wine ratio in alcohol grade “medium” on volatile.acidity range 0.25-0.4 is very high.
Wine rating.vs.alcohol.grade.vs.citric.acid plot shows that excellent wine ratio in alcohol grade “medium” on citric.acid at 0 and 0.35-0.5 is very high.
Wine rating.vs.alcohol.grade.vs.total.sulfur.dioxide plot shows that excellent wine ratio in alcohol grade “medium” on total.sulfur.dioxide at 5-30 is very high.
Wine rating.vs.alcohol.grade.vs.sulphates plot shows that excellent wine ratio in alcohol grade “medium” on sulphates range 0.7-0.9 is very high.
Win rating.vs.alcohol.grade.vs.sourness.vs.chlorides shows that there hardly to determine wine quality by tongue (chorides and sourness).
It is very surprise that smell(total.sulfur.dioxide) has influnce over the wine rating but taste(chorides and sourness) has not.
## Warning: Removed 23 rows containing non-finite values (stat_bin).
## Warning: Removed 2 rows containing missing values (geom_bar).
The distribution for alcohol appears right skewed with the peaking at 0, 0.95, may be since the demand of wine are tend to be lower on higher percent alcohol wine.
Alcohol percentage, Sulphates, Volatile acidity correlate with wine rating positively but Volatile acidity correlate negatively.
The alcohol variance in wine rating good and excellent are much larger than normal.
The citric acid variance in wine rating good and normal are much larger than excellent.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
On low alcohol percentage wine exellent quality rarely to be found mostly is normal rating wine.
On medium-low alcohol percentage wine exellent can be found on low volatile acidity and total sulfur dioxide below 55 but mostly are normal and good rating wine.
On mediumw alcohol percentage wine exellent can be found at high percentage on total sulfur dioxide below 50 and sulphate upper than 0.65 and mostly are exellent and good rating wine.
——
## 'data.frame': 1599 obs. of 15 variables:
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
## $ sourness : num 5.18 5.46 5.48 8.1 5.18 ...
## $ wine_rating : Ord.factor w/ 3 levels "normal"<"good"<..: 1 1 1 2 1 1 1 3 3 1 ...
## $ alcohol.grade : Ord.factor w/ 3 levels "low"<"medium-low"<..: 1 1 1 1 1 1 1 2 1 2 ...
The data set contain 1599 wine from 2009. I start by understand the variables in data set and try to interpret in term of sense that human can percieve. Surprisingly, I found that the taste sourness and salty has no evidence that they has influence over the quality of wine but the smell (total sulfur dioxide), addictive content (alcohol), voilatile acidity, citric acid and sulphates has influence over it. On low alcohol percentage we hardly found excellent wine_rating but mostly is normal and you can find some of good wine rating if they has total SO2 in range 5-60 and sulphates in range 0.53-0.73 ,On medium-low alcohol percentage wine exellent can be found on low volatile acidity and total sulfur dioxide below 55 but mostly are normal and good rating wine,On mediumw alcohol percentage wine exellent can be found at high percentage on total sulfur dioxide below 50 and sulphate upper than 0.65 and mostly are exellent and good rating wine.
After I research and ask drinkers, I found that there are many significant variables we do not have such as type of grape, where it made from, age of wine.
Anyway drink wine is an art, it is possible that the wine rating come from diffent drinker can be different value.